A fixed point iteration for computing the matrix logarithm 
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In various areas of applied numerics, the problem of calculating the logarithm of a matrix A 
emerges. Since series expansions of the logarithm usually do not converge well for matrices far away 
from the identity, the standard numerical method calculates successive square roots. In this article, 
a new algorithm is presented that relies on the computation of successive matrix exponentials. 
Convergence of the method is demonstrated for a large class of initial matrices and favorable choices 
of the initial matrix are discussed. 
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I. INTRODUCTION 

Calculating the logarithm of a quadratic matrix A 
may be a difficult problem. Of course, for problems of 
moderate size it is usually not prohibitive to calculate 
its logarithm (if existent) by complete diagonahzation 
However, there may exist several arguments against 
this method: In some cases, A might not be diagonaliz- 
able. Also, if A{t) and \xiA{t) are known for some given 
time t, one might want to have an efficient solution for 
\T\A{t + dt) where dt is small. The approximate calcu- 
lation of eigenvalues and eigenvectors of A with Arnoldi 
methods [4I would enable one to follow a time-dependent 
spectrum efficiently. Unfortunately, these methods only 
yield a part of the spectrum. In such cases, direct diag- 
onahzation is probably not the best method to compute 
In A. 

If the matrix A is near the identity matrix, one may 
truncate the Taylor series expansion of the logarithm 



ln[l - (1 - A)] 



E 



{l-AY 



(1) 



However, if this is not the case, convergence may become 
extremely slow or even fail, such that such a series expan- 
sion is of little practical use. The standard resolution to 
this problem is to bring A near the identity by repeatedly 
computing its square root 



lnA = 2'=lnAi/2'" 



(2) 



If the square-roots are only approximated, this can be 
adapted to an efficient method 3, 4]. 

The present work undertakes a different step to de- 
crease the computational burden. Instead of calculat- 
ing successive square roots, a fixed-point iteration is pre- 
sented that requires the calculation of successive expo- 
nentials. The article is organized as follows: After dis- 
cussing the fixed point iteration scheme in[TTl in section 
mil further improvements are discussed. In section IIVI 
a numerical implementation is discussed, and its perfor- 
mance is analyzed inlVl 



II. FIXED POINT ITERATION SCHEME 



Consider the iteration formula 



(3) 



If one regards Xi as real numbers, it is quite straight- 
forward to see that for any Xo > the above iteration 
formular converges to 



lim Xri 



In A. 



(4) 



For example, it is immediately evident that X^o = In A is 
the only fixed point of g{x) in [31 In addition, by investi- 
gating that [(/'(Xcx))! = < 1 one can conclude that this 
fixed point is stable and thus that one has a contractive 
map which converges to In a for all positive numbers. 

Evidently, one can also consider the iteration ^ for 
matrices X„ and A (assumed to have a well-defined log- 
arithm). Clearly, one still has fixed points at the loga- 
rithms g{ln{A) + 2kTri) = \n{A) + ikiri. The question is 
under which conditions these fixed points are attractive, 
i.e., for which matrices Xi the difference (with respect to 
some norm) to In A becomes smaller with each iteration. 

For simplicity I will restrict myself to an initial matrix 
ATo that commutes with A. It is straightforward to show 
that 



[^^+1,^] =0, 



(5) 



and therefore if [Aq , A] = the iteration ([3]) defines a 
series of mutually commuting matrices. 

With inserting Xi = ln(v4) -I- one has (using 
[^,A,]=0) 



-A, 



A,; 1 . 



(6) 



From now on I will assume that A, is a normal matrix. 



A,:, A 



= 0. The spectral theorem implies the 



I.e. 

existence of an orthonormal basis, within which A^ has 
diagonal form (with possibly complex eigenvalues A^''*). 
Therefore, the iteration ([3]) transforms the eigenvalues of 
the deviation matrix A^ according to ^ . 

If the eigenvalues of A^ are real, one can deduce from 



-f A- I > 



-00 < X < 00 



(7) 
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that the next deviation matrix A^+i will be positive 
semidefinite. Also, one obtains for the operator norm (for 
a positive semidefinite matrix this is simply its largest 
eigenvalue) 



I A. 



= max 



X 



1 



(8) 



represent the difference to any specific branch of the log- 
arithm without any difference. 

With the Banach fixed point theorem one can then 
show that the iteration ([3]) will converge towards In A. In 
the following, some suitable choices for an initial matrix 
to the algorithm will be discussed. 



Then, one can deduce from 

e~^ + A-l < lAI 



A > -1.256. 



(9) 



that the norm of A^+i will be smaller than the norm 
of Ai if Ai is positive semidefinite. In other words, for 
any matrix Xq = InA + Aq with a self-adjoint initial 
deviation matrix Ag and [Aq , A] = the iteration ^ is 
contractive and will converge to \nA. Note that (under 
the precondition that [A , Aq] = 0) Aq = AJ does not 
imply that A ^ , but A = A'' imphes Aq = AJ. 

Complex eigenvalues X = x + iy (in case of A.; being 
normal) are also transformed according to ^ . Demand- 
ing that the modulus of all eigenvalues of A should be- 
come smaller with each iteration, one obtains a region 
of convergence V. If all eigenvalues of the A-matrix are 
contained within 



V = {X = x + iyeC 



-A 



+ A-1| <|A|'} (10) 



convergence is assured, see also figure [H Note that the 




III. ALGORITHMIC OPTIMIZATIONS 

Beyond performing scaling transformations InA = 
ln(A/cr) +ln((T)l, one has further options to improve the 
algorithmic performance. 

It is evident from ^ and figure [T] that a good initial 
guess for the matrix logarithm may save a lot of compu- 
tation time. Such a guess can be made if some bounds 
on the eigenvalues of A (and hence also to those of In A) 
are known. Such bounds can for example be cheaply 
extracted from Gershgorins circle theorem (which is es- 
pecially useful if the matrix A is diagonally dominant) 
or they may be already known from the definition of 
the problem. Then an initial matrix with eigenvalues 
close to InA can be constructed from the linear Taylor 
approximation that could be optimized for the regime 
[In(Amin), lii(Ainax)]- Assumiug an approximately uni- 
form distribution of eigenvalues, one could for example 
consider 



^0 - 



In 



- 1 



-A. (11) 



Other choices could include some adapted polynomials of 
A. 

In addition to a good guess for an initial matrix one 
may also think of optimizations of the algorithm itsself. 
For example, the iteration 



n+l 



X„ 



[Ae 



-x„ 



(12) 



has in the 1-dimensional case near its fixed point Xoo — 
InA better convergence properties than ([3]). However, 
the iteration does not converge far away from the so- 
lution. Therefore, it could be used as an optional last 
refinement step after the conventional iteration ^ has 
converged with sufficient accuracy. 



FIG. 1: Density plot of /(A) = in (\e-^ + \- -ln(|A|^). 

The origin is in the center and both real part of A (horizontal 
axis) and imaginary part (vertical axis) range from —tt . . . + 
TT. The isolines represent values of /(A) = {0,0.5,1.0,1.5}, 
respectively, such that the leftmost isoline borders the region 
of sure convergence (right, green to blue colours). In any case, 
convergence is ensured if the eigenvalues of A are not too far 
from the real axis and if their real part is positive. 

ambiguity of the logarithm function does not pose a ma- 
jor problem here, since the A-matrix can be chosen to 



IV. A NUMERICAL ALGORITHM 

The fixed-point iteration Q requires the calculation 
of the exponential of the iterates. The associated com- 
putational burden can be reduced by exploiting that the 
proposed fixed-point iteration produces a series of mutu- 
ally commuting matrices, if initialized properly. There- 
fore, two successive exponentials can also be computed 
iteratively, which has the advantage that the norm of the 
matrix to be exponentiated in each step does not become 



3 



too large. In this case, the inverse scahng and squaring 
method, which is based on 



[exp {B/V)] 



2^ 



(13) 



is known to produce good results with a modest number 
of matrix multiplications [H, @ . For example using a k- 
th order Taylor approximant the exponential (jl3p can be 
calculated with just fc+ j — 1 matrix multiplications. The 
algorithm can be summarized as follows 

• Determine an initial matrix 

— with eigenvalues close to those of In A 

- with [Xo ,A]=Q 

• set lo = exp(— Xq) 

• iterate 

Xn+l = AYn — 1 + Xn 

Yn+i = r„exp{-(v4r„-l)} 

until convergence is reached 
(e.g. \\Xn+i\\\\Xn+i-Xn\\<e) ^ 
or a maximum number of iterations has been ex- 
ceeded 

• refinement step [optional]: 



Xfin — Xn ~ 2 1-^ 



Ae 



Note that F„ will converge to the inverse of A (although 
there exist by far more efficient methods to achieve this) . 
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FIG. 2: Number of matrix multiplication required to achieve 
convergence to In A for different stopping thresholds e. Hollow 
symbols include all matrix multiplications (including calcula- 
tion Taylor approximants) , whereas full symbols refer to the 
multiplications required by the fixed point iteration scheme. 
Error bars correspond to one standard deviation sampled over 
1000 random matrices. For the chosen test matrices, the al- 
gorithm shifts the computational burden towards matrix ex- 
ponentiation, and the total number of matrix multiplications 
increases approximately linearly with the required precision. 
Around e « 1 ■ 10"^*^ the scaling breaks down due to numer- 
ical roundoff errors (error bars omitted) , compare also figure 
131 It is also visible that the dependence on the dimension of 
the matrix is rather weak. 



V. PERFORMANCE ANALYSIS 

In order to estimate the performance of the algorithm, 
some sample matrices have been generated. For different 
matrix dimensions, 1000 matrices have been randomly 
generated. Some test matrices had a uniform eigenvalue 
distribution in the interval [1 • 10~®,1], others were ex- 
ponentially distributed according to p\(x) — \e^^'-^ also 
in the interval [1 • 10~*,1]. The diagonal matrix gener- 
ated by these eigenvalues has been transformed into a 
non-diagonal test matrix by applying random orthogo- 
nal transformations A = Q"^ Aj:,Q with Qij = Sij — 2viVj 
and J2i — 1- All iterations were initialized with Xq — 
2A — [I + ln(2)]l, which is not necessarily the optimum 
choice. For all norm calculations, the Frobenius matrix 
norm has been used. The iteration for the logarithm 
used as a stopping criterion ||^n-i-i — Xn\\ < e, and 

the calculation of the exponential of a matrix B used 
||-B||" < £• To calculate the matrix exponential, the scal- 
ing and squaring method was used in combination with 
truncated Taylor approximants Note that the effi- 
ciency of the scaling and squaring method can in principle 
be increased by approximately 50% if instead of Taylor 



approximants. Fade approximations are used The 
number of matrix multiplications to obtain convergence 
was therefore counted with and without including those 
required by computing the Taylor approximants to the 
matrix exponential, see figure [2l Whereas the number of 
total required matrix multiplications increases approxi- 
mately linearly with e, it is visible that the algorithm 
shifts the computational burden towards matrix expo- 
nentiation, since the number of remaining matrix multi- 
plications is approximately constant. The total number 
of matrix multiplications is competative with much more 
sophisticated algorithms existing in the literature Q. 

In order to get an estimate on the error of In A 
one can perform the inverse operation, i.e., exponen- 
tiate the result (with a much better precision) and 
compare it with the original test matrix. Thus, from 
1 1 exp (In A -|-A)— one has an estimate for the 
actual error ||A||, see figure [31 It becomes visible that 
for e > 10~^° the accuracy of the computed logarithm is 
much better than one might have expected. In addition, 
with a single application of (fT^ . the final error of the 
algorithm can be reduced by orders of magnitude. 
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VI. SUMMARY 
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FIG. 3: Scaling of the error estimate depending on the stop- 
ping criterion e. Each point displays the median (out of 
1000 random samples) and error bars give the 95% confi- 
dence bounds for the median. Above a threshold criterion 
of e > 1 • 10"^*^, the error estimate scales approximately lin- 
early -with e. Applying only a single iteration of (|12p with 
10"^^ the final error estimate can be reduced ex- 
tremely (patterned symbols). For small e, a saturation due 
to finite machine precision is visible. 



The fixed-point iteration presented here has some ad- 
vantages in comparison to standard algorithms 0, 
One of the most important advantages is the ease of im- 
plementation. Given the simplicity of the inverse-scaling 
and squaring algorithm for computing the matrix ex- 
ponential, the user basically has to implement matrix- 
matrix multiplication. As has been demonstrated, the 
fixed point iteration guarantees fast convergence if a good 
initial guess is given. For some special test problems, the 
convergence is competative with state of the art algo- 
rithms. With some kno'wledge on the spectrum of the ma- 
trix of "which the logarithm should be taken, the conver- 
gence can be tuned to the specific problem. In any case, 
the presented iteration can be used as a final step -with a 
lo-w-precision result produced -with established methods 
ii. 

In the future, it "would be interesting -whether the iter- 
ation algorithm can be adapted to increase convergence 
for a bad initial guess. 



VII. ACKNOWLEDGEMENTS 

The author gratefully acknowledges financial support 
by the DFG grant # SCHU 1557/1-2. 
* schallerOtheory . phy . tu-dresden . de 



[1] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. 
Flannery, Numerical Recipes in C, Cambridge University 
Press, Cambridge, (1994). 

[2] R. B. Lehoucq, D. C. Sorensen, and C. Yang, ARPACK 
Users' Guide: Solution of Large-Scale Eigenvalue Prob- 
lems with Implicitly Restarted Arnoldi Methods, SIAM, 
http : //www . caam . rice . edu/sof tware/ARPACK , (1998) . 

[3] S. H. Cheng, N. J. Higham, C. S. Kenney, and A. J. Laub, 
Approximating The Logarithm Of A Matrix To Specified 
Accuracy, SIAM Journal on Matrix Analysis and Applica- 
tions 22, 1112-1125, (2001). 



[4] J. R. Cardoso and C. S. Kenney and F. Silva, Computing 
the square root and logarithm of a real P-orthogonal ma- 
trix, Applied Numerical Mathematics 46, 173-196, (2003). 

[5] N. J. Higham, The scaling and squaring method for the 
matrix exponential revisited, SIAM Journal on Matrix 
Analysis and Applications 26, 1179-1193, (2005). 

[6] C. Moler and C. V. Loan, Nineteen Dubious Ways to 
Compute the Exponential of a Matrix, Twenty-Five Years 
Later, SIAM Revie-w 45, 3, (2003). 



